Texas A&M Repository

Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery

Author: Jason H. Moore
John F. Bowyer
Nathaniel M. Crabtree
Nysia I. George
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2017

Background: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy against model complexity when evolving classifiers. Using Pareto optimization, a CES can identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs had been developed only for binary-class datasets. In this study, we developed a multi-class CES.

Results: The multi-class CES was compared with three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small.

Conclusion: The multi-class extension of CES increases its appeal for complex, multi-class datasets in which the goal is to identify important biomarkers and features.
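
Two quantities this abstract leans on, the Pareto trade-off between accuracy and the number of features, and the Tanimoto distance as a measure of feature-set stability, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' CES code: the candidate models and gene names are hypothetical, the set form of the Tanimoto distance used is 1 - |A∩B| / (|A| + |B| - |A∩B|), and summarizing stability as the mean pairwise distance across repeated runs is an assumption.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Hypothetical candidate classifiers: (name, accuracy, number of features used).
Candidate = Tuple[str, float, int]

def pareto_front(models: List[Candidate]) -> List[str]:
    """Names of models not dominated on (higher accuracy, fewer features)."""
    front = []
    for name, acc, k in models:
        # A model is dominated if another is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            a2 >= acc and k2 <= k and (a2 > acc or k2 < k)
            for _, a2, k2 in models
        )
        if not dominated:
            front.append(name)
    return front

def tanimoto_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A∩B| / (|A| + |B| - |A∩B|); 0.0 means identical feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)

def mean_pairwise_distance(feature_sets: Sequence[Set[str]]) -> float:
    """Assumed stability summary: average Tanimoto distance over all pairs of runs."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)

candidates = [("m1", 0.90, 10), ("m2", 0.88, 3), ("m3", 0.85, 5), ("m4", 0.90, 12)]
print(pareto_front(candidates))  # ['m1', 'm2']

runs = [{"geneA", "geneB", "geneC"}, {"geneA", "geneB"}, {"geneA", "geneD"}]
print(round(mean_pairwise_distance(runs), 3))  # 0.583
```

A lower mean distance indicates that repeated runs select more similar feature sets, that is, a more stable selection.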

Dietary Iodine Sufficiency and Moderate Insufficiency in the Lactating Mother and Nursing Infant: A Computational Perspective.

Author: Eva D McLanahan
Jeffery M Gearhart
Jian Wang
Nysia I George
W Fisher
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

The Institute of Medicine recommends that lactating women ingest 290 μg iodide/d and that nursing infants less than two years of age ingest 110 μg/d. The World Health Organization, United Nations Children's Fund, and International Council for the Control of Iodine Deficiency Disorders recommend population maternal and infant urinary iodide concentrations ≥ 100 μg/L to ensure iodide sufficiency. For breast milk, researchers have proposed that an iodide concentration of 150–180 μg/L indicates iodide sufficiency for the mother and infant; however, no national or international guidelines exist for breast milk iodine concentration. For the first time, a biologically based model of the lactating woman and nursing infant, spanning delivery to 90 days postpartum, was constructed to predict maternal and infant urinary iodide concentrations, breast milk iodide concentration, the amount of iodide transferred in breast milk to the nursing infant each day, and maternal and infant serum thyroid hormone kinetics. The maternal and infant models each consisted of three sub-models: iodide, thyroxine (T4), and triiodothyronine (T3). When the model simulated a maternal intake of 290 μg iodide/d, the average daily amount of iodide ingested by the nursing infant after 4 days of life gradually increased from 50 to 101 μg/d over 90 days postpartum. The predicted average urinary iodide concentrations of both the lactating mother and the infant exceeded 100 μg/L, and the predicted average breast milk iodide concentration was 157 μg/L. The predicted serum thyroid hormones (T4, free T4 (fT4), and T3) in both the nursing infant and the lactating mother were indicative of euthyroidism. The model was calibrated using serum thyroid hormone concentrations for lactating women from the United States and successfully predicted serum T4 and fT4 levels (within a factor of two) for lactating women in other countries. T3 levels were adequately predicted.
Infant serum thyroid hormone levels were adequately predicted for most data. Under moderately iodide-deficient conditions, where the lactating mother's dietary iodide intake may range from 50 to 150 μg/d, the model satisfactorily described the iodide measurements in urine and breast milk, although with some variation. Predictions of serum thyroid hormones in moderately iodide-deficient lactating women (50 μg/d) and nursing infants did not closely agree with mean reported serum thyroid hormone levels; however, predictions were usually within a factor of two. Excellent agreement between prediction and observation was obtained for a recent moderate iodide deficiency study in lactating women. Measurements included iodide levels in infant and maternal urine, iodide in breast milk, and serum thyroid hormone levels in infant and mother. A maternal iodide intake of 50 μg/d resulted in a predicted 29–32% reduction in serum T4 and fT4 in nursing infants; however, the reduced serum T4 and fT4 levels were within most of the published reference intervals for infants. This biologically based model is an important first step toward integrating the rapid changes that occur in the thyroid system of the nursing newborn in order to predict adverse outcomes from exposure to thyroid-acting chemicals, drugs, radioactive materials, or iodine deficiency.
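
Several model-evaluation statements in this abstract rely on a "within a factor of two" criterion, and one reports a percent reduction. A minimal Python sketch of both calculations follows; the function names and the example serum values are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple

def within_factor_of_two(predicted: float, observed: float) -> bool:
    """True when 0.5 <= predicted / observed <= 2 (both values must be positive)."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("predicted and observed values must be positive")
    return 0.5 <= predicted / observed <= 2.0

def fraction_within_factor_of_two(pairs: Iterable[Tuple[float, float]]) -> float:
    """Share of (predicted, observed) pairs that satisfy the factor-of-two criterion."""
    flags = [within_factor_of_two(p, o) for p, o in pairs]
    if not flags:
        raise ValueError("at least one (predicted, observed) pair is required")
    return sum(flags) / len(flags)

def percent_reduction(baseline: float, reduced: float) -> float:
    """Percent decrease from a baseline value to a reduced value."""
    return 100.0 * (baseline - reduced) / baseline

# Hypothetical serum T4 values (prediction and observation in the same units).
pairs = [(9.0, 10.0), (4.0, 10.0), (18.0, 10.0)]
print(fraction_within_factor_of_two(pairs))
print(percent_reduction(10.0, 7.0))
```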

Public Library of Science (PLOS)

An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data.

Author: Ching-Wei Chang
John F Bowyer
Nathaniel M Crabtree
Nysia I George
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of the main objectives of these research efforts is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. To reach that goal, improving the accuracy of outlier detection is an important precursor. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. We therefore propose a univariate algorithm that uses a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data, and we implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial-distributed data.
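
The leave-one-out idea can be sketched for a single feature's counts in Python. This is a simplified illustration, not the authors' published algorithm. It assumes a negative binomial fitted to the remaining samples by the method of moments (falling back to Poisson when the remaining counts show no overdispersion), a two-sided tail probability as the deviation measure, and a hypothetical cutoff `alpha`. At each iteration the most extreme observation is flagged and removed, and the loop stops when no observation falls below the cutoff.

```python
import math
from typing import List, Tuple

def nb_logpmf(k: int, mu: float, var: float) -> float:
    """Log-pmf of a negative binomial with mean mu and variance var.

    Falls back to Poisson(mu) when var <= mu (no overdispersion).
    """
    if mu <= 0:  # all remaining counts are zero: point mass at 0
        return 0.0 if k == 0 else float("-inf")
    if var <= mu:
        return k * math.log(mu) - mu - math.lgamma(k + 1)
    r = mu * mu / (var - mu)  # size parameter (method of moments)
    p = r / (r + mu)          # success probability, so that the mean is mu
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

def two_sided_p(x: int, mu: float, var: float) -> float:
    """Two-sided tail probability of observing x under the fitted distribution."""
    lower = sum(math.exp(nb_logpmf(k, mu, var)) for k in range(x + 1))  # P(X <= x)
    upper = 1.0 - lower + math.exp(nb_logpmf(x, mu, var))               # P(X >= x)
    return max(0.0, min(1.0, 2.0 * min(lower, upper)))

def loo_outliers(counts: List[int], alpha: float = 0.01,
                 min_size: int = 3) -> List[Tuple[int, float]]:
    """Iteratively flag (index, p-value) of the most extreme count until none is below alpha."""
    remaining = list(range(len(counts)))
    flagged: List[Tuple[int, float]] = []
    while len(remaining) > max(min_size, 2):
        best_idx, best_p = -1, 1.0
        for i in remaining:
            # Fit the reference distribution to every remaining sample except i.
            others = [counts[j] for j in remaining if j != i]
            mu = sum(others) / len(others)
            var = sum((c - mu) ** 2 for c in others) / (len(others) - 1)
            p = two_sided_p(counts[i], mu, var)
            if p < best_p:
                best_idx, best_p = i, p
        if best_idx < 0 or best_p >= alpha:
            break
        flagged.append((best_idx, best_p))
        remaining.remove(best_idx)
    return flagged

# Hypothetical read counts for one gene across six samples of one group.
print(loo_outliers([12, 15, 11, 14, 13, 250]))
```

Refitting after each removal is what makes the procedure iterative: once an extreme count is flagged, it no longer inflates the mean and variance used to judge the remaining samples.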

Low-Frequency Mutational Heterogeneity of Invasive Ductal Carcinoma Subtypes: Information to Direct Precision Oncology

Author: Barbara L. Parsons
Karen L. McKim
Malathi Banda
Meagan B. Myers
Nysia I. George
Publication venue: 'MDPI AG'
Publication date: 01/02/2019

Information regarding the role of low-frequency hotspot cancer-driver mutations (CDMs) in breast carcinogenesis and therapeutic response is limited. Using the sensitive and quantitative Allele-specific Competitor Blocker PCR (ACB-PCR) approach, mutant fractions (MFs) of six CDMs (PIK3CA H1047R and E545K, KRAS G12D and G12V, HRAS G12D, and BRAF V600E) were quantified in invasive ductal carcinomas (IDCs; ~20 samples per subtype). Measurable levels (i.e., ≥ 1 × 10⁻⁵, the lowest ACB-PCR standard employed) of the PIK3CA H1047R, PIK3CA E545K, KRAS G12D, KRAS G12V, HRAS G12D, and BRAF V600E mutations were observed in 34/81 (42%), 29/81 (36%), 51/81 (63%), 9/81 (11%), 70/81 (86%), and 48/81 (59%) of IDCs, respectively. Correlation analysis using available clinicopathological information revealed that PIK3CA H1047R and BRAF V600E MFs correlate positively with maximum tumor dimension. Analysis of IDC subtypes revealed that minor mutant subpopulations of critical genes in the MAP kinase pathway (KRAS, HRAS, and BRAF) were prevalent across IDC subtypes. Few triple-negative breast cancers (TNBCs) had appreciable levels of PIK3CA mutation, suggesting that individuals with TNBC may be less responsive to inhibitors of the PI3K/AKT/mTOR pathway. These results suggest that low-frequency hotspot CDMs contribute significantly to the intertumoral and intratumoral genetic heterogeneity of IDCs, which has the potential to affect precision oncology approaches.
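
The prevalence figures above come from counting IDCs whose mutant fraction reaches the lowest ACB-PCR standard (1 × 10⁻⁵). The Python sketch below reproduces the reported percentages from the reported counts and shows the thresholding step; the `prevalence` helper and the example mutant fractions are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LOWEST_STANDARD = 1e-5  # lowest ACB-PCR standard employed (from the abstract)

def prevalence(mutant_fractions: Sequence[float],
               threshold: float = LOWEST_STANDARD) -> Tuple[int, int, int]:
    """(number measurable, number tested, rounded percent) for one mutation."""
    hits = sum(1 for mf in mutant_fractions if mf >= threshold)
    total = len(mutant_fractions)
    return hits, total, round(100 * hits / total)

# Counts of IDCs with measurable mutant fractions, as reported (n = 81).
reported: Dict[str, int] = {
    "PIK3CA H1047R": 34, "PIK3CA E545K": 29, "KRAS G12D": 51,
    "KRAS G12V": 9, "HRAS G12D": 70, "BRAF V600E": 48,
}
percentages = {m: round(100 * k / 81) for m, k in reported.items()}
for mutation, pct in percentages.items():
    print(f"{mutation}: {reported[mutation]}/81 = {pct}%")

# Hypothetical per-tumor mutant fractions for one mutation.
print(prevalence([2e-6, 1e-5, 3.4e-4, 0.0]))  # (2, 4, 50)
```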

Assessing Sex Differences in the Risk of Cardiovascular Disease and Mortality per Increment in Systolic Blood Pressure: A Systematic Review and Meta-Analysis of Follow-Up Studies in the United States

Author: Ching-Wei Chang
Karen A. Hicks
Nysia I. George
Yu-Chung Wei
Publication date: 25/01/2017

In the United States (US), cardiovascular (CV) disease accounts for nearly 20% of national health care expenditures. Because costs are expected to increase as the population ages, research is needed to address the growing burden of CV disease and sex-related differences in its diagnosis, treatment, and outcomes. Hypertension is a major risk factor for CV disease and mortality. To evaluate whether there are sex-related differences in the effect of systolic blood pressure (SBP) on the risk of CV disease and mortality, we performed a systematic review and meta-analysis. We conducted a comprehensive search of PubMed and Google Scholar to identify US-based studies published before 31 December 2015. We identified eight publications for CV disease risk, which provided 9 female and 8 male effect size (ES) observations. We also identified twelve publications for CV mortality, which provided 10 female and 18 male ES estimates. Our meta-analysis estimated that the pooled ES for increased risk of CV disease per 10 mm Hg increment in SBP was 25% for women (ES 1.25; 95% confidence interval (CI): 1.18, 1.32) and 15% for men (ES 1.15; 95% CI: 1.11, 1.19). The pooled increase in CV mortality per 10 mm Hg SBP increment was similar for women and men (women: 1.16; 95% CI: 1.10, 1.23; men: 1.17; 95% CI: 1.12, 1.22). After adjusting for age and baseline SBP, the risk of CV disease per 10 mm Hg SBP increment was 1.1-fold higher for women than for men (P < 0.01; 95% CI: 1.04, 1.17). Heterogeneity was moderate but significant. There was no significant sex difference in CV mortality.
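
The pooled effect sizes above combine study-level ratios reported with 95% confidence intervals. The abstract does not say which pooling model was used. The Python sketch below assumes a DerSimonian–Laird random-effects model on the log scale, recovers each study's standard error from its 95% CI, and reports the I² heterogeneity statistic. The input values are hypothetical, not the studies analyzed here.

```python
import math
from typing import List, Tuple

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI

def pool_random_effects(studies: List[Tuple[float, float, float]]):
    """DerSimonian-Laird pooling of ratio ES given as (es, ci_low, ci_high).

    Returns (pooled ES, CI low, CI high, tau^2, I^2 in percent).
    """
    y = [math.log(es) for es, _, _ in studies]  # log effect sizes
    # Standard error on the log scale from the width of the 95% CI.
    v = [((math.log(hi) - math.log(lo)) / (2 * Z95)) ** 2 for _, lo, hi in studies]
    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return math.exp(mu), math.exp(mu - Z95 * se), math.exp(mu + Z95 * se), tau2, i2

# Hypothetical per-10 mm Hg effect sizes from three studies: (ES, 95% CI low, high).
example = [(1.20, 1.10, 1.31), (1.30, 1.15, 1.47), (1.18, 1.05, 1.33)]
es, lo, hi, tau2, i2 = pool_random_effects(example)
print(f"pooled ES {es:.2f} (95% CI {lo:.2f}, {hi:.2f}); tau^2 = {tau2:.4f}; I^2 = {i2:.1f}%")
```

Studies that report ES per a different SBP increment would need rescaling first. For example, assuming a log-linear dose-response, a ratio reported per 1 mm Hg raised to the 10th power gives the per-10 mm Hg ratio.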

Characteristics of CV disease risk studies.

Author: Ching-Wei Chang
Karen A. Hicks
Nysia I. George
Yu-Chung Wei

Sex-specific and overall effect sizes (ES) for CV mortality per 10 mm Hg increment in SBP.

Author: Ching-Wei Chang
Karen A. Hicks
Nysia I. George
Yu-Chung Wei

ES observations are ordered by baseline SBP values. The corresponding ES IDs are listed in Table 2 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170218#pone.0170218.t002).

Number of features with 0 through 4 detected outliers in the control group of rat RNA-seq data.

Author: Ching-Wei Chang
John F. Bowyer
Nathaniel M. Crabtree
Nysia I. George